
Prosper Marketplace is America’s first peer-to-peer lending marketplace, with over $7 billion in funded loans. Borrowers request personal loans on Prosper and investors (individual or institutional) can fund anywhere from $2,000 to $35,000 per loan request. Investors can consider borrowers’ credit scores, ratings, and histories and the category of the loan. Prosper handles the servicing of the loan and collects and distributes borrower payments and interest back to the loan investors.

Prosper verifies borrowers’ identities and select personal data before funding loans and manages all stages of loan servicing. Prosper’s unsecured personal loans are fully amortized over a period of three or five years, with no pre-payment penalties. Prosper generates revenue by collecting a one-time fee on funded loans from borrowers and assessing an annual loan servicing fee to investors.

Load the Prosper loan dataset

## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : chr  "1021339766868145413AB3B" "10273602499503308B223C1" "0EE9337825851032864889A" "0EF5356002482715299901A" ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : chr  "2007-08-26 19:09:29.263000000" "2014-02-27 08:28:07.900000000" "2007-01-05 15:00:47.090000000" "2012-10-22 11:02:35.010000000" ...
##  $ CreditGrade                        : chr  "C" "" "HR" "" ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : chr  "Completed" "Current" "Completed" "Current" ...
##  $ ClosedDate                         : chr  "2009-08-14 00:00:00" "" "2009-12-17 00:00:00" "" ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : chr  "" "A" "" "A" ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : chr  "CO" "CO" "GA" "GA" ...
##  $ Occupation                         : chr  "Other" "Professional" "Other" "Skilled Labor" ...
##  $ EmploymentStatus                   : chr  "Self-employed" "Employed" "Not available" "Employed" ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : chr  "True" "False" "False" "True" ...
##  $ CurrentlyInGroup                   : chr  "True" "False" "True" "False" ...
##  $ GroupKey                           : chr  "" "" "783C3371218786870A73D20" "" ...
##  $ DateCreditPulled                   : chr  "2007-08-26 18:41:46.780000000" "2014-02-27 08:28:14" "2007-01-02 14:09:10.060000000" "2012-10-22 11:02:32" ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : chr  "2001-10-11 00:00:00" "1996-03-18 00:00:00" "2002-07-27 00:00:00" "1983-02-28 00:00:00" ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : chr  "$25,000-49,999" "$50,000-74,999" "Not displayed" "$25,000-49,999" ...
##  $ IncomeVerifiable                   : chr  "True" "True" "True" "True" ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : chr  "E33A3400205839220442E84" "9E3B37071505919926B1D82" "6954337960046817851BCB2" "A0393664465886295619C51" ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : chr  "2007-09-12 00:00:00" "2014-03-03 00:00:00" "2007-01-17 00:00:00" "2012-11-01 00:00:00" ...
##  $ LoanOriginationQuarter             : chr  "Q3 2007" "Q1 2014" "Q1 2007" "Q4 2012" ...
##  $ MemberKey                          : chr  "1F3E3376408759268057EDA" "1D13370546739025387B2F4" "5F7033715035555618FA612" "9ADE356069835475068C6D2" ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...
  • This data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.
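The listing above has the shape of `str()` output for a data frame whose text columns were read as character strings. A minimal sketch of that loading step follows; the file name `prosperLoanData.csv` and the helper `load_loans` are assumptions for illustration and are not taken from this report.

```r
# Hypothetical helper: read a loans CSV and keep text columns as character,
# which matches the `chr` columns shown by str() above.
load_loans <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

# Self-contained demo on a tiny stand-in file. With the real data the call
# would be something like load_loans("prosperLoanData.csv") (name assumed).
demo_path <- tempfile(fileext = ".csv")
writeLines(c("ListingNumber,Term,LoanStatus,BorrowerRate",
             "193129,36,Completed,0.158",
             "1209647,36,Current,0.092"),
           demo_path)
demo_loans <- load_loans(demo_path)
str(demo_loans)
```

Calling `str()` on the full data frame would then list every variable with its type and first few values, as in the output above.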

Univariate Plots Section

  • Explore borrower related variables and their characteristics.

  • What is the term chosen by borrowers?

## 
##    12    36    60 
##  1614 87778 24545

Plotting loans’ term

  • 36 months is by far the most common term chosen by borrowers.
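To put the term counts in proportion, they can be converted to percentages. The sketch below reuses the three counts from the table above; on the full data they would presumably come from something like `table(loans$Term)`, where `loans` is an assumed name for the data frame.

```r
# Counts copied from the term table above.
term_counts <- c("12" = 1614, "36" = 87778, "60" = 24545)

# Share of each term as a percentage of all loans.
term_share <- round(100 * term_counts / sum(term_counts), 2)
print(term_share)

# Name of the most common term.
most_common_term <- names(which.max(term_counts))
print(most_common_term)
```

The 36-month term accounts for roughly three quarters of all loans in the data set.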

  • Next, we explore the loan origination quarter.

  • To draw a clearer conclusion, I aggregate LoanOriginationQuarter by year.

  • The graph shows a dip in 2009, after which the number of loans starts to increase again.
##      Number Of Borrowers Percentage
## 2005                  22       0.02
## 2006                5906       5.18
## 2007               11460      10.06
## 2008               11552      10.14
## 2009                2047       1.80
## 2010                5652       4.96
## 2011               11228       9.85
## 2012               19553      17.16
## 2013               34345      30.14
## 2014               12172      10.68
  • The table confirms that after the dip in 2009 the number of borrowers increased drastically: 2009 accounts for 1.8% of all loans in the data set, while 2013 accounts for about 30%.
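The yearly table can be derived from quarter labels such as "Q3 2007". The following is a small sketch of one way to do it on toy values; the variable names are illustrative and the exact code behind this report is not shown in the source.

```r
# Illustrative quarter labels in the format shown by str(), e.g. "Q3 2007".
quarter <- c("Q3 2007", "Q1 2014", "Q1 2007", "Q4 2012", "Q2 2009")

# Drop the leading "Qn " and keep the four-digit year.
year <- as.integer(sub("^Q[1-4] ", "", quarter))

# Count loans per year and express each count as a percentage of the total.
counts <- table(year)
summary_tbl <- data.frame(
  NumberOfBorrowers = as.vector(counts),
  Percentage = round(100 * as.vector(counts) / sum(counts), 2),
  row.names = names(counts)
)
print(summary_tbl)
```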

  • Next, we look at the range of interest rates on Prosper loans.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1340  0.1840  0.1928  0.2500  0.4975

Plotting BorrowerRate

  • Borrower rates range from 0 to about 0.5. Three quarters of borrowers pay 0.25 or less (the third quartile), and the median is 0.184. Interestingly, some borrowers have a zero interest rate.

  • Let’s check the number of borrowers with zero interest rates.

## [1] 8
  • Eight loans have a borrower rate of zero. I could not determine why these borrowers received such an offer. Perhaps the lenders had a particular interest in them: before 2009 lenders determined the interest rates, and all eight loans originated before 2009.

  • Next, we explore which Prosper rating levels exist and which rating is most common among borrowers.

## [1] "AA" "A"  "B"  "C"  "D"  "E"  "HR" ""
## 
##    AA     A     B     C     D     E    HR       
##  5372 14551 15581 18345 14274  9795  6935 29084

Plotting Prosper ratings for borrowers

  • Among rated loans the distribution is roughly bell-shaped, and the most common Prosper ratings are A, B, C, and D. Note that 29,084 loans have no rating recorded.

  • Let’s check the purposes for which borrowers take loans.

## 
##               Auto      Baby&Adoption               Boat 
##               2572                199                 85 
##           Business Cosmetic Procedure Debt Consolidation 
##               7189                 91              58308 
##    Engagement Ring        Green Loans  Home Improvements 
##                217                 59               7433 
## Household Expenses    Large Purchases     Medical/Dental 
##               1996                876               1522 
##         MotorCycle      Not Available              Other 
##                304              16965              10494 
##      Personal Loan                 RV        Student Use 
##               2395                 52                756 
##              Taxes           Vacation      Wedding Loans 
##                885                768                771

Plotting Listing Category

  • Here I created a new variable, “ListingCategory..string”, which holds the full category name instead of the numeric listing category code.
  • The graph shows that the majority of loans are taken for Debt Consolidation. Setting aside “Not Available” and “Other”, the next most common purposes are Home Improvements and Business.
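One way to build such a name variable is a lookup vector indexed by the numeric code. The sketch below assumes that the category names, in the order they are listed in the Univariate Analysis summary of this report, correspond to codes 0 through 20; that correspondence is an assumption and should be checked against the data dictionary.

```r
# Category names, assumed to correspond to codes 0 through 20
# of ListingCategory..numeric. in this order.
category_names <- c(
  "Not Available", "Debt Consolidation", "Home Improvements", "Business",
  "Personal Loan", "Student Use", "Auto", "Other", "Baby&Adoption", "Boat",
  "Cosmetic Procedure", "Engagement Ring", "Green Loans",
  "Household Expenses", "Large Purchases", "Medical/Dental", "MotorCycle",
  "RV", "Taxes", "Vacation", "Wedding Loans"
)

# First ten codes from the str() output above.
category_code <- c(0, 2, 0, 16, 2, 1, 1, 2, 7, 7)

# Codes start at 0 but R vectors are 1-indexed, hence the + 1.
category_string <- category_names[category_code + 1]
print(table(category_string))
```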

  • Let’s explore the geographical distribution of borrowers.

Plotting BorrowerState

  • Prosper is a California-based company, which might explain why more loans originate in that state than in any other.
  • The next most common states are FL, GA, IL, NY, and TX.

  • Next, we explore the range of loan amounts borrowers request.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000

  • The distribution is positively skewed. The minimum loan amount is 1,000 and the maximum is 35,000. The third quartile is 12,000, so there is a large gap between Q3 and the maximum.

  • Let’s see how the graph changes when the x-axis is limited to the 95th percentile.

  • The majority of loans are below 10,000 (the median is 6,500).

  • Now we check borrowers’ stated monthly income.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3200    4667    5608    6825 1750003

  • There seems to be an outlier: the maximum stated monthly income is 1,750,003, far above the third quartile of 6,825.

  • I will change the x limits to look at the graph more closely.

  • Most loans go to borrowers with lower stated monthly incomes. Interestingly, some borrowers report zero monthly income and still obtained a loan.

  • Let’s check the number of borrowers who got loans with zero income.

## [1] 1394

  • A total of 1,394 borrowers got loans with zero stated income. Their listings were created both before and after 2009, so, unlike the zero-rate loans, they cannot be explained by lenders’ particular interest before 2009. All of these borrowers fall under the “$0” or “Not employed” income range. Perhaps they demonstrated other assets to get the loan, or they have some other kind of earnings that is not counted as monthly income.

  • Next, we look at the income range graph.

## 
##             $0      $1-24,999      $100,000+ $25,000-49,999 $50,000-74,999 
##            621           7274          17337          32192          31050 
## $75,000-99,999  Not displayed   Not employed 
##          16916           7741            806

Plotting income range

  • Most loans went to borrowers in the $25,000-49,999 and $50,000-74,999 income ranges.

  • Let’s look at the debt-to-income ratio graph.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554

  • To get a clearer graph, we limit the x-axis to the 99th percentile.

##  50%  90%  99% 
## 0.22 0.42 0.86

  • Now the graph is much easier to read. About 99% of debt-to-income ratios are at or below 0.86. This makes sense, because borrowers cannot spend all of their income on debt payments.
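The percentile cutoff used to trim the plot can be computed with `quantile()`. Here is a minimal sketch on toy values (the vector `dti` is illustrative and is not the real column); `na.rm = TRUE` matters because the real variable contains missing values.

```r
# Toy debt-to-income ratios with a missing value and one extreme outlier.
dti <- c(0.17, 0.18, 0.06, 0.15, 0.26, 0.36, NA, 0.24, 0.25, 10.01)

# Percentiles; na.rm = TRUE is needed because the vector contains an NA.
q <- quantile(dti, probs = c(0.5, 0.9, 0.99), na.rm = TRUE)
print(q)

# Keep only non-missing values at or below the 99th percentile.
cutoff <- q[["99%"]]
dti_trimmed <- dti[!is.na(dti) & dti <= cutoff]
print(range(dti_trimmed))
```

Trimming at a high percentile removes the extreme value from the plotted range without changing the underlying data.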

  • Let’s investigate the number of borrowers whose debt-to-income ratio is greater than 1.

## 
##  FALSE   TRUE 
## 104584    799

  • 799 borrowers took on this risk: their debt-to-income ratio is greater than 1.

  • Let’s look at the status of their loans.

  • Most of these borrowers were able to complete their loans, which suggests they have other sources of income.

  • Since Prosper is a peer-to-peer company, we now look at how many investors fund each loan.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    2.00   44.00   80.48  115.00 1189.00
## 
##     1 
## 27814

  • This graph shows loans with more than one investor.

  • 27,814 loans have only one investor.

  • Now we look at lender yield.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.0100  0.1242  0.1730  0.1827  0.2400  0.4925

## [1] 22

  • Out of 113,937 loans, only 22 have a negative lender yield, meaning the lender made a loss. The mean lender yield is 0.1827.

Univariate Analysis

What is the structure of your dataset?

This data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.

  1. Prosper offers terms of 12, 36, and 60 months in this dataset. 36 months is the most common term.
  2. Prosper rating levels are “AA”, “A”, “B”, “C”, “D”, “E”, “HR”, and blank (no rating). AA is the best rating and HR is the lowest. The most common Prosper ratings are A, B, C, and D.
  3. Borrower rate ranges from 0 to about 0.5.
  4. Loan original amount ranges from 1,000 to 35,000.
  5. The listing categories, i.e. the purposes for which loans are taken, are “Not Available”, “Debt Consolidation”, “Home Improvements”, “Business”, “Personal Loan”, “Student Use”, “Auto”, “Other”, “Baby&Adoption”, “Boat”, “Cosmetic Procedure”, “Engagement Ring”, “Green Loans”, “Household Expenses”, “Large Purchases”, “Medical/Dental”, “MotorCycle”, “RV”, “Taxes”, “Vacation”, and “Wedding Loans”. Most loans are taken for Debt Consolidation.
  6. Income range levels are “$0”, “Not employed”, “$1-24,999”, “$25,000-49,999”, “$50,000-74,999”, “$75,000-99,999”, “$100,000+”, and “Not displayed”. Most loans went to borrowers in the $25,000-74,999 income range.
  7. California is the most common borrower state. The next most common states are FL, GA, IL, NY, and TX.
  8. About 99% of debt-to-income ratios are at or below 0.86.
  9. The most common number of investors per loan is 1 (27,814 loans), although the median is 44.
  10. Mean lender yield is 0.1827.

What is/are the main feature(s) of interest in your dataset?

Prosper rating, interest rate, term, and loan original amount seem to be the main features. I plan to examine how these factors are interrelated and how other factors influence them.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Analyzing credit score, employment status, income range, stated monthly income, loan category, and similar variables can help to better understand the main features.

Did you create any new variables from existing variables in the dataset?

I created two variables:

  1. ListingCategory..string. The existing variable ListingCategory..numeric. contains codes ranging from 0 to 20. For easier analysis, I created ListingCategory..string, which holds the category names such as “Debt Consolidation”, “Home Improvements”, “Business”, “Personal Loan”, “Student Use”, “Auto”, and so on.

  2. LoanOriginationYear. The existing variable LoanOriginationQuarter identifies the quarter in which a loan originated. For easier analysis, I combined quarters into their respective years (for example, Q1 2005, Q2 2005, Q3 2005, and Q4 2005 become 2005).

Bivariate Plots Section

Here, I set up a data frame containing the variables of interest for further analysis.

  • This graph shows the correlations between different variables.

  • Now we look at the relationship between borrower rate and Prosper rating.

  • Borrower rate depends heavily on Prosper rating: the interest rate increases as the Prosper rating gets worse. AA is the top rating and HR is the lowest.

  • Now we analyze what the Prosper rating is based on.

  • Employment status seems to play a role in determining Prosper rating. Employed borrowers tend to have better Prosper ratings than borrowers who are not employed.

  • Next, we see how income range influences Prosper rating.

  • Higher income ranges go with better Prosper ratings, presumably because higher earners can more comfortably pay their debts on time.

  • Next, we see how credit score influences Prosper rating.

  • Credit score influences Prosper rating: as credit score increases, Prosper rating improves.

## ProsperRating..Alpha.: AA
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   680.0   740.0   780.0   774.1   800.0   880.0 
## -------------------------------------------------------- 
## ProsperRating..Alpha.: A
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   640.0   700.0   720.0   729.9   760.0   880.0 
## -------------------------------------------------------- 
## ProsperRating..Alpha.: B
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   600.0   680.0   700.0   706.9   740.0   860.0 
## -------------------------------------------------------- 
## ProsperRating..Alpha.: C
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   600.0   660.0   680.0   689.9   720.0   880.0 
## -------------------------------------------------------- 
## ProsperRating..Alpha.: D
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   600.0   660.0   680.0   680.3   700.0   860.0 
## -------------------------------------------------------- 
## ProsperRating..Alpha.: E
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   600.0   640.0   660.0   662.5   680.0   860.0 
## -------------------------------------------------------- 
## ProsperRating..Alpha.: HR
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     600     660     680     677     700     860

  • The mean credit score falls steadily as the Prosper rating worsens from AA (774.1) to E (662.5); only HR (677) breaks the pattern slightly. There seems to be a strong relationship between the two.
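The grouped summaries above have the layout that base R's `by()` prints. A small sketch on toy data shows the idea; the data frame `toy` and its values are made up for illustration, and the report's actual call is not shown in the source.

```r
# Toy data: lower credit-score bound and Prosper rating for six loans.
toy <- data.frame(
  CreditScoreRangeLower = c(780, 760, 720, 700, 660, 640),
  ProsperRating = c("AA", "AA", "A", "A", "E", "E")
)

# Order the ratings from best to worst so groups print in that order.
toy$ProsperRating <- factor(toy$ProsperRating,
                            levels = c("AA", "A", "B", "C", "D", "E", "HR"))
rating <- droplevels(toy$ProsperRating)  # drop levels absent from the toy data

# Summary statistics of credit score for each rating present in the data.
by_rating <- by(toy$CreditScoreRangeLower, rating, summary)
print(by_rating)

# Group means alone, as a named vector.
mean_by_rating <- tapply(toy$CreditScoreRangeLower, rating, mean)
print(mean_by_rating)
```

Setting the factor levels explicitly keeps the groups in rating order rather than alphabetical order.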

  • Now we look at what factors influence credit score.

## 
##  Pearson's product-moment correlation
## 
## data:  CreditScoreRangeLower and CurrentCreditLines
## t = 46.809, df = 106330, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1361976 0.1479760
## sample estimates:
##       cor 
## 0.1420918

  • More credit lines go with a better credit score, although the correlation is weak (r ≈ 0.14).
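Output of this form is what `cor.test()` prints for a Pearson correlation. As a minimal sketch, the call and the pieces worth extracting look like this on toy vectors (the values are illustrative, not taken from the data set):

```r
# Toy vectors standing in for two numeric columns; values are illustrative.
credit_score <- c(640, 680, 700, 720, 740, 760, 800, 820)
credit_lines <- c(5, 14, 6, 10, 12, 9, 19, 17)

# Pearson correlation test. With real columns that contain NAs,
# cor.test() uses only the complete pairs.
res <- cor.test(credit_score, credit_lines, method = "pearson")

r <- unname(res$estimate)   # correlation coefficient
print(r)
print(res$conf.int)         # 95 percent confidence interval
print(res$p.value)
```

With more than 100,000 observations even a weak correlation yields a tiny p-value, so the size of `r` is more informative here than its significance.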

## 
##  Pearson's product-moment correlation
## 
## data:  CreditScoreRangeLower and TotalInquiries
## t = -96.631, df = 112780, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2819071 -0.2711270
## sample estimates:
##        cor 
## -0.2765257

  • Fewer inquiries go with a better credit score (r ≈ -0.28).

## 
##  Pearson's product-moment correlation
## 
## data:  MonthlyLoanPayment and CreditScoreRangeLower
## t = 102.99, df = 113340, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2871995 0.2978465
## sample estimates:
##      cor 
## 0.292532

  • Larger monthly loan payments go with better credit scores (r ≈ 0.29).

## 
##  Pearson's product-moment correlation
## 
## data:  BorrowerRate and CreditScoreRangeLower
## t = -175.17, df = 113340, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4661358 -0.4569730
## sample estimates:
##        cor 
## -0.4615667

  • Higher credit scores get better (lower) interest rates (r ≈ -0.46).

  • Now we look at how monthly income, term, and loan original amount are influenced by different factors.

## 
##  Pearson's product-moment correlation
## 
## data:  StatedMonthlyIncome and MonthlyLoanPayment
## t = 67.764, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1912423 0.2024055
## sample estimates:
##       cor 
## 0.1968303

  • Borrowers with higher incomes tend to have larger monthly loan payments (r ≈ 0.20).

## 
##  Pearson's product-moment correlation
## 
## data:  StatedMonthlyIncome and LoanOriginalAmount
## t = 69.353, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1956816 0.2068243
## sample estimates:
##       cor 
## 0.2012595

  • The higher the income, the larger the loan amount tends to be (r ≈ 0.20).

## 
##             $0      $1-24,999 $25,000-49,999 $50,000-74,999 $75,000-99,999 
##            621           7274          32192          31050          16916 
##      $100,000+ 
##          17337

  • However, above the $25,000-49,999 range the number of borrowers generally decreases as income increases. This seems plausible, because people with higher incomes are more self-sufficient and may not need personal loans.

  • Employed borrowers seem to get higher loan amounts.

  • People take higher loan amounts for debt consolidation and Baby&Adoption.

  • Now we look at the purposes for which people take loans once loan origination year is taken into account.

  • The majority of loans originated in 2012-2014. It seems that in earlier years people did not take personal and Student Use loans.

  • Borrowers can get higher loans when they choose to pay them off over more years.

  • Term has an influence on borrower rate.

## 
##  Pearson's product-moment correlation
## 
## data:  LoanOriginalAmount and BorrowerRate
## t = -117.58, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3341283 -0.3237719
## sample estimates:
##        cor 
## -0.3289599

  • Larger loan amounts tend to come with lower interest rates (r ≈ -0.33).

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

Borrower rate is related to Prosper rating, credit score, loan original amount, and term. The relationship between borrower rate and credit score is the strongest of the tested correlations, with a correlation coefficient of r = -0.46. In turn, credit score is related to total inquiries, credit lines, and monthly loan payments, and loan original amount is influenced by term, employment status, and listing category.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

  1. Larger loan payments, fewer inquiries, and more credit lines are associated with better credit scores.
  2. People who earn more tend to borrow larger amounts. But as income increases, the number of people taking loans decreases.
  3. Employment status influences loan amount. Employed borrowers seem to have the opportunity to apply for higher loan amounts.
  4. Borrowers can get higher loans when they choose to pay them off over more years.
  5. Interest rates are lower for higher loan amounts.
  6. People take higher loan amounts for debt consolidation and Baby&Adoption.

What was the strongest relationship you found?

The strongest relationship is between borrower rate and credit score, with a correlation coefficient of r = -0.46. In turn, there is a strong relationship between credit score and Prosper rating.
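The values quoted in this analysis come from Pearson correlation tests, so they are correlation coefficients (r), which can be negative; the coefficient of determination is their square and cannot be. A short sketch of that arithmetic, using rounded values from the outputs above (the vector names are labels chosen here for illustration):

```r
# Correlation coefficients reported by the tests above (rounded).
r <- c(rate_vs_score = -0.46, amount_vs_rate = -0.33, income_vs_amount = 0.20)

# Squaring gives the share of variance a simple linear fit would explain.
r_squared <- round(r^2, 2)
print(r_squared)
```

Even the strongest pair, borrower rate and credit score, therefore shares only about a fifth of its variance in a simple linear sense.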

Multivariate Plots Section

  • In this section, we look at how the main factors are interrelated.

  • At the same level of Prosper rating and credit score, a longer term gives borrowers the chance to apply for a higher loan amount.

  • Next, we check whether income influences loan amount. In the bivariate analysis, we saw that loan original amount and stated monthly income have a correlation of r ≈ 0.20.

  • Now we see how they behave when term comes into the picture.

  • Borrowers with a good Prosper rating can obtain lower borrower rates and, at the same time, take higher loans.

  • Even with low income, borrowers can take higher loan amounts when they choose to pay them off over 5 years. This seems reasonable, because a longer term keeps monthly loan payments affordable and the debt-to-income ratio well below 1.
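The effect of term on the monthly payment can be illustrated with the standard formula for a fully amortized loan. The sketch below assumes the annual borrower rate is split evenly into 12 monthly periods; the function name `monthly_payment` is hypothetical, and the 60-month case is a made-up comparison at the same rate rather than a loan from the data.

```r
# Monthly payment on a fully amortized loan (standard annuity formula).
# Assumes the annual rate is divided evenly into 12 monthly periods.
monthly_payment <- function(principal, annual_rate, term_months) {
  r <- annual_rate / 12
  if (r == 0) return(principal / term_months)  # zero-interest edge case
  principal * r / (1 - (1 + r)^(-term_months))
}

# First loan in the str() output: 9425 at a rate of 0.158 over 36 months.
p36 <- monthly_payment(9425, 0.158, 36)
# Same amount and rate over 60 months (hypothetical comparison).
p60 <- monthly_payment(9425, 0.158, 60)
print(round(c(p36 = p36, p60 = p60), 2))
```

The 36-month result is roughly 330, in line with the MonthlyLoanPayment of 330 listed for that loan, and stretching the same loan to 60 months lowers the monthly payment noticeably.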

  • Overall, borrowers of every employment status can get higher loans, but they have to choose a longer term. Still, the graph clearly shows that employed borrowers take larger loan amounts than others in each term group.

  • Next is the graph of loan original amount vs. income range.

  • Here too, borrowers can take higher loans when they are willing to pay over a longer term and when they earn more.

  • In the bivariate analysis, we saw that higher loan original amounts come with better interest rates, with a correlation of r ≈ -0.33. But when term comes into the picture, interest rates are a little higher.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

Regardless of credit score, Prosper rating, employment status, and monthly income, borrowers have the opportunity to take higher loan amounts, but they have to choose a longer term to pay them off.

Were there any interesting or surprising interactions between features?

People with higher incomes tend to take higher loan amounts. When I further analyzed loan original amount with respect to borrower rate, I found that people can borrow more money, but once term comes into the picture, interest rates are a little higher.


Final Plots and Summary

Plot One

Description One

Borrowers with a good Prosper rating can obtain lower borrower rates and, at the same time, take higher loans. Borrowers with a lower Prosper rating cannot take loans as high as $30,000, and they pay higher borrower rates even on smaller loan amounts. This trend is to be expected: lenders take on more risk when lending to borrowers with a poor Prosper rating, so they are compensated with higher interest rates. It resembles the stock market, where taking on more risk can lead to a large profit or a large loss.

Plot Two

Description Two

This boxplot shows that borrowers can take higher loans when they are willing to pay over a longer term and when they earn more. Prosper also appears to ensure that even borrowers taking higher loan amounts keep a debt-to-income ratio below 1.

Plot Three

Description Three

Some insights that can be drawn from this graph:

  1. Loans with a “Not Available” loan category originated before 2008.
  2. The majority of loans in categories such as taxes, vacation, wedding loans, motorcycle, boat, and cosmetic procedures originated after 2010.
  3. The majority of personal and student loans originated in 2008 and 2009. After these years, people no longer took loans in those categories.

It seems that people’s way of living has changed a lot since 2010. With more data available to analyze, it might be possible to reach a clearer conclusion about lifestyles.


Reflection

  • The data set has nearly 114,000 loans from Nov 2005 to March 2014. After 2009 the number of loans increased drastically. Prosper also changed its business model in 2009, which might have attracted many borrowers.

  • Previously lenders determined the borrower rate; now Prosper sets interest rates based on credit risk. Many interesting insights can be drawn from this data. Initially I was confused by the large number of variables, but over time I got the hang of them. It is also surprising to see that the purposes for which people take loans have changed drastically over the years.

  • I think much more can be analyzed with this data, such as why some people are not able to pay their loans on time, what determines interest rates, and what leads people to take loans.